Abstract: In the context of the multiple regression model, suppose that the parameter vector
of interest $\beta$ is constrained to lie in the subspace defined by the hypothesis $H\beta = h$,
where this restriction is based on either additional information or prior knowledge. Then the
restricted estimator performs better than the ordinary least squares estimator. In addition,
when the number of variables is large relative to the number of observations, the least
absolute shrinkage and selection operator (LASSO) estimator is suggested for variable
selection. In this paper, we define a restricted LASSO estimator and construct three classes
of LASSO-type estimators that achieve both variable selection and restricted estimation. The
asymptotic performance of the proposed estimators is studied, and a simulation is conducted to
analyze asymptotic relative efficiencies. The results are applied to the prostate dataset,
where the expected prediction errors and risks are compared. It is shown that the proposed
shrunken LASSO estimators, resulting from the double shrinking methodology, perform better
than the classical LASSO.
1 Introduction

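Consider a linear regression model of the form

$$Y_n = X_n\beta + \epsilon_n, \qquad (1)$$

where the data are drawn as $\{(x_i, Y_i)\}$ with $x_i \in \mathbb{R}^p$ and $Y_i \in \mathbb{R}$
for $i = 1, 2, \ldots, n$. The $x_i$'s are the regressors and $Y_i$ is the response variable of
the $i$th observation, $\beta \in \mathbb{R}^p$ is the unknown vector of coefficients to be
estimated, and $\epsilon_n$ is the error vector with $E(\epsilon_n) = 0$ and
$E(\epsilon_n\epsilon_n^T) = \sigma^2 I_n$ ($\sigma^2 < \infty$), where $I_n$ is the identity
matrix of order $n$.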

In general, the main goals of the multiple regression model (1) are the estimation of the
parameters and the prediction of the response for a given design matrix. The estimation
problem is usually solved through the ordinary least squares (OLS) method, where the
parameters are estimated by the values minimizing the residual sum of squares
$\|Y_n - X_n\beta\|_2^2 = \sum_{i=1}^n (Y_i - x_i^T\beta)^2$. Provided $X_n$ is of full rank,
so that $X_n^T X_n$ is nonsingular and can be inverted, the least squares estimator of $\beta$
is

$$\tilde\beta_n = (X_n^T X_n)^{-1} X_n^T Y_n = C_n^{-1} X_n^T Y_n, \qquad C_n = X_n^T X_n.$$

The corresponding estimator of $\sigma^2$ is

$$s_n^2 = \frac{1}{m}(Y_n - X_n\tilde\beta_n)^T (Y_n - X_n\tilde\beta_n), \qquad m = n - p. \qquad (2)$$

When the errors are normally distributed, $\tilde\beta_n \sim N_p(\beta, \sigma^2 C_n^{-1})$,
independent of $m s_n^2/\sigma^2$, which has a central chi-square distribution with $m$ d.f.
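
The OLS quantities above can be sketched numerically as follows. This is a minimal illustration, assuming NumPy and synthetic data; the function name `ols_fit` and the simulated design are hypothetical and not part of the paper.

```python
import numpy as np

def ols_fit(X, y):
    """Return the OLS estimate, C_n = X^T X, and the variance estimate s_n^2 of eq. (2)."""
    n, p = X.shape
    C = X.T @ X                          # C_n = X_n^T X_n
    beta = np.linalg.solve(C, X.T @ y)   # solves C_n beta = X_n^T Y_n
    resid = y - X @ beta
    m = n - p                            # degrees of freedom
    s2 = resid @ resid / m               # s_n^2
    return beta, C, s2

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=50)
beta_ols, C_n, s2 = ols_fit(X, y)
```

Solving the normal equations with `np.linalg.solve` avoids forming $C_n^{-1}$ explicitly, which is usually preferable numerically when $C_n$ is nonsingular.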

The standard procedures rely on the assumption that $C_n$ is nonsingular; otherwise $C_n$
cannot be inverted and the parameters cannot be uniquely estimated. The OLS regression method
finds the unbiased linear combination of the columns of $X_n$ that minimizes the residual sum
of squares. However, if $p$ is large or the regressors are highly correlated
(multicollinearity), OLS may yield estimates with large variance, which reduces the accuracy
of the prediction. Possible solutions are (1) variable selection, for example best subset
selection (Miller, 2002); (2) dimension reduction techniques, for example principal component
regression or partial least squares regression; and (3) regularization, such as ridge (Hoerl
and Kennard, 1970), LASSO (Tibshirani, 1996), SCAD (Fan and Li, 2001), elastic net (Zou and
Hastie, 2005), etc.

Variable selection addresses the situation in which there are $p$ input variables and the
objective is to select the optimal model among all possible models. Perhaps the most intuitive
approach is preselection or subset selection, that is, to simply pick out a smaller subset of
the covariates based on a certain relevance criterion and fit the (standard) model to these
covariates only. Examining all possible models is computationally infeasible when $p$ is large
(say, larger than 100). There exist heuristics to cope with this problem, such as forward
selection, backward elimination or stepwise selection, but they are still unstable.
Instability means that small changes in the data result in large changes in the estimator
(Breiman, 1996). These methods use a hard decision rule (a variable either survives or dies).

The second approach is to use methods such as principal components regression or partial least
squares. These methods derive a small number of linear combinations of the original
explanatory variables and use these as covariates instead of the original variables. This may
be reasonable for prediction purposes, but the resulting models are often difficult to
interpret (Hastie et al., 2009).

Regularization methods are a promising alternative. In these methods, the coefficients are
shrunk rather than selected as in subset selection. This process is more continuous, and hence
yields lower variance than subset selection and also reduces the prediction error of the full
model. Shrinkage often improves prediction accuracy, trading increased bias for decreased
variance, as discussed in Hastie et al. (2009). These are also called shrinkage methods
because they shrink the regression coefficients toward zero. Other names for these methods are
"penalized regression methods" or, more generally, "sparse regression".

A general sparse regression estimator minimizes the criterion

$$f(\beta) + \sum_{j=1}^{p} P_\eta(|\beta_j|; \lambda),$$

where $f(\cdot)$ is a differentiable loss function (in linear regression usually
$f(\beta) = \frac{1}{2}\|Y_n - X_n\beta\|_2^2$), $P_\eta(\cdot;\cdot)$ is a penalty function,
$\lambda$ is a tuning parameter and $\eta$ indexes a penalty family. For example, the power
family of bridge regression (Frank and Friedman, 1993) is

$$P_\eta(|\beta|, \lambda) = \lambda|\beta|^\eta, \qquad \eta \in [0, 2].$$

If $\eta \in [0, 1]$, then $P_\eta(|\beta|, \lambda)$ is concave, and it is a convex function
if $\eta \in [1, 2]$. Some special cases are (1) $\eta = 0$, best subset regression, (2)
$\eta = 2$, ridge regression and (3) $\eta = 1$, LASSO regression.

The elastic net family (Zou and Hastie, 2005) is

$$P_\eta(|\beta|, \lambda) = \lambda\left\{(\eta - 1)\frac{\beta^2}{2} + (2 - \eta)|\beta|\right\}, \qquad \eta \in [1, 2].$$

In this family, $\eta = 1$ results in the LASSO estimator, and the ridge estimator is obtained
by taking $\eta = 2$.

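Fan and Li (2001) defined the SCAD family as

$$P_\eta(|\beta|, \lambda) =
\begin{cases}
\lambda|\beta|, & |\beta| < \lambda, \\
\dfrac{2\eta\lambda|\beta| - \beta^2 - \lambda^2}{2(\eta - 1)}, & |\beta| \in [\lambda, \eta\lambda], \\
\dfrac{\lambda^2(\eta + 1)}{2}, & |\beta| > \eta\lambda.
\end{cases}$$

For small signals, $|\beta| < \lambda$, it acts as the LASSO, and for large signals,
$|\beta| > \eta\lambda$, the penalty flattens and leads to the unbiasedness of the regularized
estimate.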
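
The three penalty families can be written down directly. The sketch below assumes NumPy; the function names are hypothetical, and the SCAD middle branch uses the standard form from Fan and Li (2001) with shape parameter `eta > 2`, which is an assumption about the intended parametrization.

```python
import numpy as np

def bridge_penalty(beta, lam, eta):
    """Power (bridge) family: lam * |beta|^eta with eta in [0, 2]."""
    b = np.abs(np.asarray(beta, dtype=float))
    if eta == 0:
        # eta = 0 counts nonzero coefficients (best subset); avoids 0**0 = 1
        return lam * (b > 0).astype(float)
    return lam * b ** eta

def elastic_net_penalty(beta, lam, eta):
    """Elastic net family: lam * ((eta - 1) * beta^2 / 2 + (2 - eta) * |beta|), eta in [1, 2]."""
    b = np.abs(np.asarray(beta, dtype=float))
    return lam * ((eta - 1.0) * b ** 2 / 2.0 + (2.0 - eta) * b)

def scad_penalty(beta, lam, eta):
    """SCAD penalty (standard form, assumed eta > 2): linear, then quadratic, then flat."""
    b = np.abs(np.asarray(beta, dtype=float))
    small = lam * b
    middle = (2.0 * eta * lam * b - b ** 2 - lam ** 2) / (2.0 * (eta - 1.0))
    large = lam ** 2 * (eta + 1.0) / 2.0
    return np.where(b <= lam, small, np.where(b <= eta * lam, middle, large))
```

The SCAD branches agree at the breakpoints $|\beta| = \lambda$ and $|\beta| = \eta\lambda$, so the penalty is continuous, and it is constant beyond $\eta\lambda$, which is what removes the shrinkage bias for large coefficients.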

Among all of the above methods, the best known is ridge regression. A disadvantage of this
estimator is that interpretation is not easy, since the model includes all input variables.
The LASSO is another important method. The $L_1$ penalty is used in the LASSO, while the $L_2$
penalty is used in ridge regression. This seemingly small difference leads to substantial
differences, both practically and theoretically. The LASSO penalty shrinks each $\beta_j$
toward the origin and pushes the coefficients of irrelevant predictors to exactly zero. Indeed,
the LASSO performs variable selection and shrinkage estimation simultaneously. One very
interesting property of the LASSO is that the predictive model is sparse (i.e. some
coefficients are exactly zero).
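
The contrast between the two penalties is easiest to see in the special case of an orthonormal design, $X^T X = I$, which is an assumption made here purely for illustration. In that case, minimizing the residual sum of squares plus $\lambda\sum_j|\beta_j|$ soft-thresholds each OLS coefficient at $\lambda/2$, while minimizing it plus $\lambda\sum_j\beta_j^2$ rescales each coefficient by $1/(1+\lambda)$. The function names below are hypothetical.

```python
import numpy as np

def lasso_orthonormal(beta_ols, lam):
    """LASSO solution under X^T X = I: soft-thresholding of the OLS estimate at lam / 2."""
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam / 2.0, 0.0)

def ridge_orthonormal(beta_ols, lam):
    """Ridge solution under X^T X = I: proportional shrinkage, never exactly zero."""
    return beta_ols / (1.0 + lam)

beta_ols = np.array([3.0, -0.4, 0.2, -2.0])
lasso_est = lasso_orthonormal(beta_ols, lam=1.0)   # small coefficients become exactly 0
ridge_est = ridge_orthonormal(beta_ols, lam=1.0)   # all coefficients stay nonzero
```

With these inputs the two small coefficients are set exactly to zero by the LASSO rule, whereas ridge keeps all four coefficients in the model.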

All regularization methods depend on one or more tuning parameters controlling the model
complexity. Choosing the tuning parameters is an important part of model fitting and is
critical in statistical applications. Two methods are commonly used: (1) cross-validation and
(2) information criteria such as AIC and BIC, which offer good practical performance.

In this paper, we focus only on the LASSO method. In the forthcoming section, we briefly 
introduce the LASSO estimator. 


1.1 LASSO Estimator 


Tibshirani (1996) proposed a new method for variable selection, called the LASSO (Least
Absolute Shrinkage and Selection Operator), that produces an accurate, stable and parsimonious
model. The LASSO is a constrained version of OLS. Owing to the sparseness property of the
$L_1$ norm, the LASSO has received much attention in recent years (Xu, 2014). The LASSO of
Tibshirani (1996) is a least-squares problem regularized by the $L_1$ norm, in which we solve
the optimization problem

$$\hat\beta_n^{L} = \arg\min_{\beta}\left\{\sum_{i=1}^{n}\Big(Y_i - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2\right\} \quad \text{subject to} \quad \sum_{j=1}^{p}|\beta_j| \le t, \qquad (3)$$

where $t$ is a constant. If $t = 0$, the model includes only the intercept term, while the
model becomes the full model when $t = \infty$. If $t \ge \sum_{j=1}^{p}|\beta_j^*|$, where
$\beta^*$ is the initial estimator of $\beta$, usually taken to be the OLS estimator
$\tilde\beta_n$, then the LASSO yields the same estimate as OLS. However, if
$0 < t < \sum_{j=1}^{p}|\beta_j^*|$, then the problem is equivalent to

$$\hat\beta_n^{L} = \arg\min_{\beta}\left\{\sum_{i=1}^{n}\Big(Y_i - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda_n\sum_{j=1}^{p}|\beta_j|\right\}, \qquad \lambda_n > 0, \qquad (4)$$

where $\lambda_n$ is a tuning parameter controlling the level of sparsity in
$\hat\beta_n^{L}$. The relation between $\lambda_n$ and the LASSO parameter $t$ is one-to-one.
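
The penalized form (4) can be minimized by cyclic coordinate descent. The sketch below is one common way to do this, not necessarily the algorithm used by the authors; it assumes NumPy, and the names `soft_threshold` and `lasso_cd` as well as the synthetic data are hypothetical. Each coordinate update minimizes the objective in (4) in one coefficient with the others held fixed, which gives a soft-thresholding step at $\lambda_n/2$.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Minimize sum((y - X b)^2) + lam * sum(|b|) by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)          # x_j^T x_j for every column
    resid = y - X @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # inner product of column j with the residual that excludes coordinate j
            rho = X[:, j] @ resid + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam / 2.0) / col_ss[j]
            if new != beta[j]:
                resid -= X[:, j] * (new - beta[j])   # keep the residual in sync
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
        if max_change < tol:
            break
    return beta

# Synthetic data, for illustration only
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.0]) + rng.normal(scale=0.5, size=60)
beta_hat = lasso_cd(X, y, lam=20.0)
```

With `lam=0.0` the iteration converges to the OLS estimate, and for `lam` larger than twice the largest $|x_j^T Y_n|$ every coefficient is set to zero, mirroring the two extremes of $t$ described above.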

1.2 Notation 

The following notation is used throughout the paper. For any dimension $d$, boldface letters
denote vectors and normal-face letters their elements, e.g. $v = (v_1, v_2, \ldots, v_d)^T$.
Capital letters denote matrices, e.g. $X$ and $\Sigma$. Let $A = (a_{ij})$ be an $m \times m$
matrix; then $A^T$ denotes the transpose of $A$, $\mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{mm}$
is the trace of $A$, and $A^{-1} = (a^{ij})$.

The $L_g$ norm of $u$ is $\|u\|_g = \big(\sum_{j}|u_j|^g\big)^{1/g}$ for $g > 0$, and
$|\cdot|$ represents the absolute value. Design vectors, or columns of $X$, are denoted by
$X_j$.

We write $\min v$ ($\max v$) to denote the minimum (maximum) component of a vector $v$. Also,
$\arg\min f(\cdot)$ ($\arg\max f(\cdot)$) is the point in the domain of $f$ at which $f$
attains its minimum (maximum) value.

For notational convenience, we use $\Phi_p(\cdot; \mu, \Sigma)$ and $\phi_p(\cdot; \mu, \Sigma)$
to indicate the c.d.f. and the p.d.f. of the $p$-variate normal distribution with mean $\mu$
and covariance matrix $\Sigma$, respectively. $H_\nu(\cdot; \Delta^2)$ denotes the c.d.f. of
the $\chi^2$-distribution with $\nu$ degrees of freedom and noncentrality parameter
$\Delta^2/2$.

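Throughout, we use the following identities:

$$E\left[\chi_{q+s}^{-2}(\Delta^2)\right] = E_r\left[(q + s - 2 + 2r)^{-1}\right],$$

$$E\left[\chi_{q+s}^{-4}(\Delta^2)\right] = E_r\left[(q + s - 2 + 2r)(q + s - 4 + 2r)\right]^{-1},$$

$$E\left[\chi_{q+s}^{-2}(\Delta^2) I(\chi_{q+s}^{2}(\Delta^2) < k)\right] = E_r\left[(q + s - 2 + 2r)^{-1} H_{q+s-2+2r}(k; 0)\right],$$

$$E\left[\chi_{q+s}^{-4}(\Delta^2) I(\chi_{q+s}^{2}(\Delta^2) < k)\right] = E_r\left[\{(q + s - 2 + 2r)(q + s - 4 + 2r)\}^{-1} H_{q+s-4+2r}(k; 0)\right],$$

where $E_r$ stands for the expectation with respect to a Poisson variable $r$ with parameter
$\Delta^2/2$, and $I(A)$ is the indicator function of the set $A$.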
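
The first identity can be checked numerically. The sketch below compares a truncated Poisson series with a Monte Carlo average; it assumes NumPy, whose `noncentral_chisquare` takes the noncentrality as the sum of squared means, so that `nonc = Delta^2` corresponds to a Poisson mixing mean of $\Delta^2/2$. The helper name `poisson_expectation` and the chosen values of $q$, $s$ and $\Delta^2$ are hypothetical.

```python
import math
import numpy as np

def poisson_expectation(g, mean, r_max=200):
    """E_r[g(r)] for r ~ Poisson(mean) with mean > 0, truncating the series at r_max."""
    total = 0.0
    log_p = -mean                          # log P(r = 0)
    for r in range(r_max + 1):
        total += math.exp(log_p) * g(r)
        log_p += math.log(mean) - math.log(r + 1)   # log P(r + 1)
    return total

q, s, delta2 = 5, 2, 4.0

# Right-hand side: E_r[(q + s - 2 + 2r)^{-1}] with r ~ Poisson(delta2 / 2)
series = poisson_expectation(lambda r: 1.0 / (q + s - 2 + 2 * r), delta2 / 2.0)

# Left-hand side: Monte Carlo estimate of E[1 / chi^2_{q+s}(delta2)]
rng = np.random.default_rng(3)
draws = rng.noncentral_chisquare(df=q + s, nonc=delta2, size=200_000)
monte_carlo = np.mean(1.0 / draws)
```

The two numbers should agree up to Monte Carlo error; the same pattern applies to the other three identities by changing the function passed to `poisson_expectation`.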

Finally, $\xrightarrow{P}$ and $\xrightarrow{D}$ are used to denote convergence in probability
and in distribution, respectively.

We organize the paper as follows. In Section 2, a restricted LASSO estimator is defined for
inference under constraint, and the concept of double shrinking is introduced. Section 3
contains the asymptotic distributions of the proposed estimators. In Section 4, a simulation
study is conducted to analyze the relative efficiencies of the estimators, while in Section 5
an application of the results is considered for the well-known prostate dataset, where we
compare expected prediction errors and asymptotic risk values.

2 Restricted LASSO and Double shrinking 

Up to this point, it was assumed that all information came from the sample, with no
non-sample information entering the estimation procedure. In this sense, we denote a LASSO
estimator of $\beta$ by $\hat\beta_n^{L}$ and term it the unrestricted LASSO estimator (ULE).

However, in some situations it is possible to have non-sample information (an a priori
restriction on the parameters), usually imposed on the model as constraints.

A set of $q$ linear restrictions on the vector $\beta$ can be written as $H\beta = h$. That
is, we suppose that the model is subject to the linear subspace restriction

$$H\beta = h, \qquad (5)$$

where $H$ is a $q \times p$ ($q < p$) matrix of known elements, with $q$ being the number of
linear restrictions to test, and $h$ is a $q \times 1$ vector of known components. The rank of
$H$ is $q$, which implies that the restrictions are linearly independent. This restriction may
be (a) a fact known from theoretical or experimental considerations, (b) a hypothesis that may
have to be tested or (c) an artificially imposed condition to reduce or eliminate redundancy
in the description of the model (see Sengupta and Jammalamadaka, 2003).

In this context, the LASSO estimator which satisfies (5) will be called the restricted LASSO
estimator (RLE), denoted by $\hat\beta_n^{RL}$. By analogy with the OLS estimator of $\beta$
subject to the restriction $H\beta = h$, we propose

$$\hat\beta_n^{RL} = \hat\beta_n^{L} - C_n^{-1}H^T(HC_n^{-1}H^T)^{-1}(H\hat\beta_n^{L} - h). \qquad (6)$$

When (5) is satisfied, $\hat\beta_n^{RL}$ has smaller asymptotic risk than $\hat\beta_n^{L}$.
However, for $H\beta \ne h$, $\hat\beta_n^{RL}$ may be biased and inconsistent in many cases.
For this reason, it is plausible to follow Fisher's recipe and define a preliminary test LASSO
estimator (PTLE) by taking $\hat\beta_n^{L}$ or $\hat\beta_n^{RL}$ according to whether the
null hypothesis

$$\mathcal{H}_0 : H\beta = h$$

is rejected or accepted.


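This estimator has the form

$$\hat\beta_n^{PTL} = \hat\beta_n^{L} - (\hat\beta_n^{L} - \hat\beta_n^{RL}) I(\mathcal{L}_n < \mathcal{L}_{n,\alpha}), \qquad (7)$$

where $\mathcal{L}_{n,\alpha}$ is the upper $\alpha$-level critical value of the exact
distribution of the test statistic $\mathcal{L}_n$ under $\mathcal{H}_0$. There are two
proposals for the test statistic $\mathcal{L}_n$. Following Saleh (2006) or Saleh et al.
(2014), the test statistic is given by

$$\mathcal{L}_n = \frac{(H\tilde\beta_n - h)^T(HC_n^{-1}H^T)^{-1}(H\tilde\beta_n - h)}{s_n^2}. \qquad (8)$$

However, this test can also be constructed upon the LASSO estimator. Here, we use the test in
(8). We believe that incorporating a test based on the LASSO estimator in analytical
computations makes everything easier.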

The PTLE depends heavily on the level of significance $\alpha$ and has a discrete nature,
reducing to one of the extremes $\hat\beta_n^{L}$ or $\hat\beta_n^{RL}$ according to the
outcome of the test. In this respect, a continuous and $\alpha$-free estimator may make more
sense. We now propose the double shrinking idea, which yields such an estimator. It is well
known that the LASSO estimator shrinks coefficients toward the origin; however, when the
restriction $H\beta = h$ is imposed on the model, it is important that the estimator is shrunk
toward the restricted one as well. Hence, there must be shrinking in two directions, or double
shrinking. Consequently, we combine the idea of James-Stein (1961) shrinkage and the LASSO to
propose the following Stein-type shrinkage LASSO estimator (SSLE):

$$\hat\beta_n^{SSL} = \hat\beta_n^{L} - k_n(\hat\beta_n^{L} - \hat\beta_n^{RL})\mathcal{L}_n^{-1}, \qquad k_n = \frac{m(q - 2)}{m + 2}, \qquad (9)$$

where $k_n$ is the shrinkage constant.

The estimator $\hat\beta_n^{SSL}$ may go past the estimator $\hat\beta_n^{RL}$. Thus, we
define the positive-rule Stein-type shrinkage LASSO estimator (PRSSLE) given by

$$\hat\beta_n^{PRSSL} = \hat\beta_n^{RL} + \{1 - k_n\mathcal{L}_n^{-1}\} I(\mathcal{L}_n > k_n)(\hat\beta_n^{L} - \hat\beta_n^{RL})
= \hat\beta_n^{SSL} - (1 - k_n\mathcal{L}_n^{-1}) I(\mathcal{L}_n < k_n)(\hat\beta_n^{L} - \hat\beta_n^{RL}). \qquad (10)$$

We note that, as the test based on $\mathcal{L}_n$ is consistent against fixed $\beta$ such
that $H\beta \ne h$, the PTLE, SSLE and PRSSLE are asymptotically equivalent to the ULE under
fixed alternatives. Hence, we investigate the asymptotic risks under local alternatives and
compare the respective performance of the estimators.
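
Given a LASSO fit, the four restricted and shrinkage estimators are simple matrix expressions. The following is a minimal sketch of equations (6)-(10), assuming NumPy; the function name, the dictionary keys, the argument `crit` (the critical value supplied by the user) and the example data are hypothetical, and the vector `0.9 * beta_ols` is only a stand-in for an actual LASSO estimate.

```python
import numpy as np

def shrinkage_lasso_estimators(beta_lasso, beta_ols, C, H, h, s2, m, crit):
    """Compute the RLE, PTLE, SSLE and PRSSLE from a LASSO estimate (eqs. (6)-(10))."""
    q = H.shape[0]
    C_inv_Ht = np.linalg.solve(C, H.T)                 # C_n^{-1} H^T
    middle = np.linalg.inv(H @ C_inv_Ht)               # (H C_n^{-1} H^T)^{-1}
    # Restricted LASSO estimator, eq. (6)
    beta_rl = beta_lasso - C_inv_Ht @ middle @ (H @ beta_lasso - h)
    # OLS-based test statistic, eq. (8); assumed to be strictly positive
    d = H @ beta_ols - h
    L_n = float(d @ middle @ d) / s2
    diff = beta_lasso - beta_rl
    # Preliminary test estimator, eq. (7): use the RLE when the test does not reject
    beta_ptl = beta_lasso - diff * float(L_n < crit)
    # Stein-type shrinkage estimator, eq. (9)
    k_n = m * (q - 2) / (m + 2)
    beta_ssl = beta_lasso - k_n * diff / L_n
    # Positive-rule version, eq. (10): never shrink past the RLE
    beta_prssl = beta_rl + (1.0 - k_n / L_n) * float(L_n > k_n) * diff
    return {"RLE": beta_rl, "PTLE": beta_ptl, "SSLE": beta_ssl,
            "PRSSLE": beta_prssl, "L_n": L_n}

# Synthetic example, for illustration only
rng = np.random.default_rng(2)
n, p = 40, 4
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.0, -1.0, 1.0]) + rng.normal(size=n)
C = X.T @ X
beta_ols = np.linalg.solve(C, X.T @ y)
s2 = np.sum((y - X @ beta_ols) ** 2) / (n - p)
beta_lasso = 0.9 * beta_ols                 # placeholder for a LASSO fit
H = np.array([[1.0, -1.0, 3.0, 1.0], [3.0, 2.0, 1.0, 0.0], [4.0, -2.0, 0.0, 5.0]])
h = np.zeros(3)
# crit = 7.81 is approximately the upper 5% point of a chi-square with 3 d.f.
est = shrinkage_lasso_estimators(beta_lasso, beta_ols, C, H, h, s2, m=n - p, crit=7.81)
```

By construction the RLE satisfies $H\hat\beta_n^{RL} = h$ exactly, and the PRSSLE coincides with the SSLE whenever $\mathcal{L}_n \ge k_n$.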
3 Asymptotic Distribution of the Estimators

In the sequel, the following regularity assumptions are needed.

A1: $\max_{1 \le i \le n} x_i^T C_n^{-1} x_i \to 0$ as $n \to \infty$, where $x_i^T$ is the
$i$th row of the design matrix $X_n$.

A2: $\lim_{n\to\infty} n^{-1}C_n = C$, where $C$ is a finite and positive-definite matrix.

We also note that the restriction $H\beta = h$ may not hold exactly; rather, it may be of the
form $H\beta = h + \xi$. In many situations, the asymptotic distribution of
$\sqrt{n}s_n^{-1}(\hat\beta_n^* - \beta)$ is equivalent to that of
$\sqrt{n}s_n^{-1}(\hat\beta_n^{L} - \beta)$ as $n \to \infty$ under the fixed alternatives

$$K_\xi : H\beta = h + \xi,$$

where $\hat\beta_n^*$ is an estimator of $\beta$. Then, to obtain the asymptotic distribution
of $\sqrt{n}s_n^{-1}(\hat\beta_n^* - \beta)$, we consider the class of local alternatives
defined by

$$K_{(n)} : H\beta = h + n^{-1/2}\xi.$$

Now, let the asymptotic cumulative distribution function (c.d.f.) of
$\sqrt{n}s_n^{-1}(\hat\beta_n^* - \beta)$ under $K_{(n)}$ be

$$G_p(x) = \lim_{n\to\infty} P_{K_{(n)}}\left\{\sqrt{n}s_n^{-1}(\hat\beta_n^* - \beta) \le x\right\}.$$

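If the asymptotic c.d.f. exists, then the asymptotic distributional bias (ADB) and quadratic
bias (ADQB) are given by

$$b(\hat\beta_n^*) = \lim_{n\to\infty} E\left[\sqrt{n}(\hat\beta_n^* - \beta)\right] = \int x\, dG_p(x)$$

and

$$B(\hat\beta_n^*) = \sigma^{-2}[b(\hat\beta_n^*)]^T C\, [b(\hat\beta_n^*)],$$

respectively, where $\sigma^2 C^{-1}$ is the MSE matrix of $\tilde\beta_n$ as $n \to \infty$.
Defining

$$M(\hat\beta_n^*) = \int x x^T dG_p(x) = \lim_{n\to\infty} E\left[n(\hat\beta_n^* - \beta)(\hat\beta_n^* - \beta)^T\right]$$

as the asymptotic distributional MSE (ADMSE), we have the weighted risk of $\hat\beta_n^*$,
with weight matrix $W$, given by

$$R(\hat\beta_n^*; W) = \mathrm{tr}[W M(\hat\beta_n^*)] = \lim_{n\to\infty} E\left[n(\hat\beta_n^* - \beta)^T W (\hat\beta_n^* - \beta)\right]$$

as the asymptotic distributional quadratic risk (ADQR).

For the proofs of all the following results, refer to the Appendix.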

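Theorem 1 Under $K_\xi : H\beta = h + \xi$ and the regularity assumptions, we have the
following as $n \to \infty$:

(i) If $C$ is a nonsingular matrix and $\lambda_n/n \to \lambda_0 \ge 0$, then
$\hat\beta_n^{L} \xrightarrow{P} \arg\min(Z)$, where

$$Z(\phi) = (\phi - \beta)^T C (\phi - \beta) + \lambda_0\sum_{j=1}^{p}|\phi_j|.$$

(ii) $\hat\beta_n^{RL} \xrightarrow{P} \arg\min(Z) - C^{-1}H^T(HC^{-1}H^T)^{-1}(H\arg\min(Z) - h)$.

(iii) $\hat\beta_n^{L} - \hat\beta_n^{RL} \xrightarrow{P} C^{-1}H^T(HC^{-1}H^T)^{-1}(H\arg\min(Z) - h)$.

(iv) $\hat\beta_n^{PTL} \xrightarrow{D} \arg\min(Z) - C^{-1}H^T(HC^{-1}H^T)^{-1}(H\arg\min(Z) - h) I(\mathcal{L} < \mathcal{L}_\alpha)$,
where $\mathcal{L} = \sigma^{-2}(HW + \xi)^T(HC^{-1}H^T)^{-1}(HW + \xi)$ and
$\mathcal{L}_\alpha$ is the upper $\alpha$-level critical value of the chi-squared
distribution with $q$ d.f.

(v) $\hat\beta_n^{SSL} \xrightarrow{D} \arg\min(Z) - kC^{-1}H^T(HC^{-1}H^T)^{-1}(H\arg\min(Z) - h)\mathcal{L}^{-1}$.

(vi) $\hat\beta_n^{PRSSL} \xrightarrow{D} \arg\min(Z) - \{k\mathcal{L}^{-1} + (1 - k\mathcal{L}^{-1}) I(\mathcal{L} < k)\} C^{-1}H^T(HC^{-1}H^T)^{-1}(H\arg\min(Z) - h)$.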

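Theorem 2 Under the class of local alternatives $K_{(n)}$ and the regularity assumptions, we
have the following as $n \to \infty$:

(i) $\sqrt{n}(\tilde\beta_n - \beta) \xrightarrow{D} N_p(0, \sigma^2 C^{-1})$.

(ii) If $\lambda_n/\sqrt{n} \to \lambda_0 \ge 0$ and $C$ is a nonsingular matrix, then

$$\sqrt{n}(\hat\beta_n^{L} - \beta) \xrightarrow{D} \arg\min(V),$$

where

$$V(u) = -2u^T W + u^T C u + \lambda_0\sum_{j=1}^{p}\left[u_j\,\mathrm{sgn}(\beta_j) I(\beta_j \ne 0) + |u_j| I(\beta_j = 0)\right]$$

and $W \sim N_p(0, \sigma^2 C)$.

(iii) $\sqrt{n}(\hat\beta_n^{RL} - \beta) \xrightarrow{D} \arg\min(V) - C^{-1}H^T(HC^{-1}H^T)^{-1}(H\arg\min(V) + \xi)$.

(iv) $\sqrt{n}(\hat\beta_n^{L} - \hat\beta_n^{RL}) \xrightarrow{D} C^{-1}H^T(HC^{-1}H^T)^{-1}(H\arg\min(V) + \xi)$.

(v) $\lim_{n\to\infty} P(\mathcal{L}_n \le x) = H_q(x; \Delta^2)$, where $H_q(\cdot; \Delta^2)$
is the c.d.f. of the noncentral chi-squared distribution.

(vi) $\sqrt{n}(\hat\beta_n^{PTL} - \beta) \xrightarrow{D} \arg\min(V) - C^{-1}H^T(HC^{-1}H^T)^{-1}(H\arg\min(V) + \xi) I(\mathcal{L} < \mathcal{L}_\alpha)$.

(vii) $\sqrt{n}(\hat\beta_n^{SSL} - \beta) \xrightarrow{D} \arg\min(V) - kC^{-1}H^T(HC^{-1}H^T)^{-1}(H\arg\min(V) + \xi)\mathcal{L}^{-1}$, $k = q - 2$.

(viii) $\sqrt{n}(\hat\beta_n^{PRSSL} - \beta) \xrightarrow{D} \arg\min(V) - \{k\mathcal{L}^{-1} + (1 - k\mathcal{L}^{-1}) I(\mathcal{L} < k)\} C^{-1}H^T(HC^{-1}H^T)^{-1}(H\arg\min(V) + \xi)$.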

3.1 Null-Consistent Estimators 

In this section, suppose the LASSO is weakly consistent, i.e., $\lambda_n = o(n^{1/2})$. Up to
this point, we used a test statistic based on the OLS estimator; however, constructing a test
based on the LASSO estimator gives the same asymptotic behavior in our setup. A test statistic
based on the LASSO estimator has the form

$$\mathcal{L}_n^{L} = \frac{(H\hat\beta_n^{L} - h)^T(HC_n^{-1}H^T)^{-1}(H\hat\beta_n^{L} - h)}{s_L^2}, \qquad (11)$$

where

$$s_L^2 = \frac{1}{m}(Y_n - X_n\hat\beta_n^{L})^T(Y_n - X_n\hat\beta_n^{L}), \qquad m = n - p. \qquad (12)$$
Theorem 3 Under the regularity assumptions and the class of local alternatives $K_{(n)}$, the
likelihood ratio test statistic $\mathcal{L}_n$ converges in distribution to $\mathcal{L}$,
which has the noncentral chi-square distribution with $q$ d.f. and noncentrality parameter
$\Delta^2 = \sigma^{-2}\delta^T C \delta$, where $\delta = C^{-1}H^T(HC^{-1}H^T)^{-1}\xi$, and
$\mathcal{L}$ is defined as

$$\mathcal{L} = \sigma^{-2}(HW + \xi)^T(HC^{-1}H^T)^{-1}(HW + \xi),$$

where $W \sim N_p(0, \sigma^2 C)$.

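Theorem 4 In Theorem 1, if $\lambda_n = o(n)$, we have the following results:

(i) $\arg\min(Z) = \beta$, and so $\hat\beta_n^{L}$ is consistent.

(ii) $\hat\beta_n^{RL} \xrightarrow{P} \beta - \delta$, where $\delta = C^{-1}H^T(HC^{-1}H^T)^{-1}(H\beta - h)$.

(iii) $\hat\beta_n^{PTL} \xrightarrow{D} \beta - \delta I(\mathcal{L} < \mathcal{L}_\alpha)$.

(iv) $\hat\beta_n^{SSL} \xrightarrow{D} \beta - k\delta\mathcal{L}^{-1}$.

(v) $\hat\beta_n^{PRSSL} \xrightarrow{D} \beta - \{k\mathcal{L}^{-1} + (1 - k\mathcal{L}^{-1}) I(\mathcal{L} < k)\}\delta$.

Remark 1 According to Theorem 4, under $\mathcal{H}_0$, all estimators are consistent for
$\beta$.

For all the following results, the proofs are deduced directly from the results in Saleh
(2006) after some algebra.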


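Theorem 5 In Theorem 2, if $\lambda_n = o(\sqrt{n})$, we have the following results:

(i) $W = \sqrt{n}(\tilde\beta_n - \beta) \xrightarrow{D} N_p(0, \sigma^2 C^{-1})$.

(ii) $W^{L} = \sqrt{n}(\hat\beta_n^{L} - \beta) \xrightarrow{D} N_p(0, \sigma^2 C^{-1})$, i.e. $W^{L} = W$.

(iii) $W^{RL} = \sqrt{n}(\hat\beta_n^{RL} - \beta) \xrightarrow{D} N_p(-\delta, \sigma^2 A)$,
where $\delta = C^{-1}H^T(HC^{-1}H^T)^{-1}\xi$ and
$A = C^{-1} - C^{-1}H^T(HC^{-1}H^T)^{-1}HC^{-1}$.

(iv) $W^{*} = \sqrt{n}(\hat\beta_n^{L} - \hat\beta_n^{RL}) \xrightarrow{D} N_p(\delta, \sigma^2(C^{-1} - A))$.

(v) $H\hat\beta_n^{L} - h \xrightarrow{D} N_q(H\beta - h, \sigma^2 HC^{-1}H^T)$.

(vi) $\begin{pmatrix} W^{L} \\ W^{*} \end{pmatrix} \xrightarrow{D} N_{2p}\left(\begin{pmatrix} 0 \\ \delta \end{pmatrix}, \sigma^2\begin{pmatrix} C^{-1} & C^{-1} - A \\ C^{-1} - A & C^{-1} - A \end{pmatrix}\right)$.

(vii) $\begin{pmatrix} W^{RL} \\ W^{*} \end{pmatrix} \xrightarrow{D} N_{2p}\left(\begin{pmatrix} -\delta \\ \delta \end{pmatrix}, \sigma^2\begin{pmatrix} A & 0 \\ 0 & C^{-1} - A \end{pmatrix}\right)$.

(viii) $\begin{pmatrix} W^{L} \\ H\hat\beta_n^{L} - h \end{pmatrix} \xrightarrow{D} N_{p+q}\left(\begin{pmatrix} 0 \\ H\beta - h \end{pmatrix}, \sigma^2\begin{pmatrix} C^{-1} & C^{-1}H^T \\ HC^{-1} & HC^{-1}H^T \end{pmatrix}\right)$.

(ix) $\sqrt{n}(\hat\beta_n^{SSL} - \beta) \xrightarrow{D} W - k\,\dfrac{C^{-1}H^T(HC^{-1}H^T)^{-1}(HW + \xi)}{\sigma^{-2}(HW + \xi)^T(HC^{-1}H^T)^{-1}(HW + \xi)}$.

(x) $\sqrt{n}(\hat\beta_n^{PRSSL} - \beta) \xrightarrow{D} W - k\,\dfrac{C^{-1}H^T(HC^{-1}H^T)^{-1}(HW + \xi)}{\mathcal{L}} - C^{-1}H^T(HC^{-1}H^T)^{-1}(HW + \xi)\left[1 - \dfrac{k}{\mathcal{L}}\right] I(\mathcal{L} < k)$,
where $\mathcal{L} = \sigma^{-2}(HW + \xi)^T(HC^{-1}H^T)^{-1}(HW + \xi)$.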


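Lemma 6 (Saleh, 2006) We have the following identities:

(i) $E\left[\chi_{q+2}^{-2}(\Delta^2)\right] = e^{-\Delta^2/2}\sum_{r \ge 0}\frac{1}{r!}\left(\frac{\Delta^2}{2}\right)^r(q + 2r)^{-1} = E_r\left[(q + 2r)^{-1}\right]$,
where $E_r$ stands for the expectation with respect to the Poisson variable $r$ with mean
$\Delta^2/2$.

(ii) $E\left[\chi_{q+2}^{-4}(\Delta^2)\right] = e^{-\Delta^2/2}\sum_{r \ge 0}\frac{1}{r!}\left(\frac{\Delta^2}{2}\right)^r\frac{1}{(q + 2r)(q - 2 + 2r)} = E_r\left[(q + 2r)(q - 2 + 2r)\right]^{-1}$.

(iii) $E\left[\chi_{q+2}^{-2}(\Delta^2) I(\chi_{q+2}^{2}(\Delta^2) < c)\right] = E_r\left[(q + 2r)^{-1} H_{q+2r}(c; 0)\right]$.

(iv) $E\left[\chi_{q+2}^{-4}(\Delta^2) I(\chi_{q+2}^{2}(\Delta^2) < c)\right] = E_r\left[\{(q + 2r)(q - 2 + 2r)\}^{-1} H_{q-2+2r}(c; 0)\right]$.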

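Theorem 7 Suppose that all estimators are $\sqrt{n}$-consistent. Then, the ADB, ADQB, ADMSE
and ADQR of the estimators are given by

(i) $b_1(\hat\beta_n^{L}) = 0$, $B_1(\hat\beta_n^{L}) = 0$,
$R_1(\hat\beta_n^{L}; W) = \sigma^2\,\mathrm{tr}(WC^{-1})$, and
$M_1(\hat\beta_n^{L}) = \sigma^2 C^{-1}$.

(ii) $b_2(\hat\beta_n^{RL}) = -\delta$, $B_2(\hat\beta_n^{RL}) = \Delta^2$,
$R_2(\hat\beta_n^{RL}; W) = \sigma^2\,\mathrm{tr}(WA) + \delta^T W\delta$, and
$M_2(\hat\beta_n^{RL}) = \sigma^2 A + \delta\delta^T$.

(iii) $b_3(\hat\beta_n^{PTL}) = -\delta H_{q+2}(\chi_q^2(\alpha); \Delta^2)$,
$B_3(\hat\beta_n^{PTL}) = \Delta^2\left[H_{q+2}(\chi_q^2(\alpha); \Delta^2)\right]^2$,

$R_3(\hat\beta_n^{PTL}; W) = \sigma^2\,\mathrm{tr}(WC^{-1}) - \sigma^2\,\mathrm{tr}(W(C^{-1} - A)) H_{q+2}(\chi_q^2(\alpha); \Delta^2) + \delta^T W\delta\, Z(\alpha; \Delta^2)$,

$M_3(\hat\beta_n^{PTL}) = \sigma^2 C^{-1} - \sigma^2(C^{-1} - A) H_{q+2}(\chi_q^2(\alpha); \Delta^2) + \delta\delta^T Z(\alpha; \Delta^2)$.

(iv) $b_4(\hat\beta_n^{SSL}) = -k\delta E\left[\chi_{q+2}^{-2}(\Delta^2)\right]$, where $k = \lim_{n\to\infty} k_n = q - 2$,

$B_4(\hat\beta_n^{SSL}) = k^2\Delta^2\left\{E\left[\chi_{q+2}^{-2}(\Delta^2)\right]\right\}^2$,

$R_4(\hat\beta_n^{SSL}; W) = \sigma^2\,\mathrm{tr}(WC^{-1}) - k\sigma^2\,\mathrm{tr}(W(C^{-1} - A)) X(\Delta^2) + k(k + 4)\,\delta^T W\delta\, E\left[\chi_{q+4}^{-4}(\Delta^2)\right]$,

$M_4(\hat\beta_n^{SSL}) = \sigma^2 C^{-1} - k\sigma^2(C^{-1} - A) X(\Delta^2) + k(k + 4)\,\delta\delta^T E\left[\chi_{q+4}^{-4}(\Delta^2)\right]$.

(v) $b_5(\hat\beta_n^{PRSSL}) = b_4(\hat\beta_n^{SSL}) - \delta E\left[(1 - k\chi_{q+2}^{-2}(\Delta^2)) I(\chi_{q+2}^{2}(\Delta^2) < k)\right]$,

$B_5(\hat\beta_n^{PRSSL}) = \Delta^2\left\{kE\left[\chi_{q+2}^{-2}(\Delta^2)\right] + E\left[(1 - k\chi_{q+2}^{-2}(\Delta^2)) I(\chi_{q+2}^{2}(\Delta^2) < k)\right]\right\}^2$,

$R_5(\hat\beta_n^{PRSSL}; W) = R_4(\hat\beta_n^{SSL}; W) - \sigma^2\,\mathrm{tr}(W(C^{-1} - A)) E\left[(1 - k\chi_{q+2}^{-2}(\Delta^2))^2 I(\chi_{q+2}^{2}(\Delta^2) < k)\right] + \delta^T W\delta\, Q(\Delta^2)$,

$M_5(\hat\beta_n^{PRSSL}) = M_4(\hat\beta_n^{SSL}) - \sigma^2(C^{-1} - A) E\left[(1 - k\chi_{q+2}^{-2}(\Delta^2))^2 I(\chi_{q+2}^{2}(\Delta^2) < k)\right] + \delta\delta^T Q(\Delta^2)$,

where

$$Z(\alpha; \Delta^2) = 2H_{q+2}(\chi_q^2(\alpha); \Delta^2) - H_{q+4}(\chi_q^2(\alpha); \Delta^2),$$

$$X(\Delta^2) = 2E\left[\chi_{q+2}^{-2}(\Delta^2)\right] - kE\left[\chi_{q+2}^{-4}(\Delta^2)\right],$$

$$Q(\Delta^2) = 2E\left[(1 - k\chi_{q+2}^{-2}(\Delta^2)) I(\chi_{q+2}^{2}(\Delta^2) < k)\right] - E\left[(1 - k\chi_{q+4}^{-2}(\Delta^2))^2 I(\chi_{q+4}^{2}(\Delta^2) < k)\right].$$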

3.2 Graphical Representations 

In this section, some graphical illustrations are provided for the asymptotic distributional
quadratic risk functions. For our purpose, we assume $p = 4$, $q = 3$,

$$\beta = [1\ \ 0\ \ {-1}\ \ 1]^T, \qquad
H = \begin{pmatrix} 1 & -1 & 3 & 1 \\ 3 & 2 & 1 & 0 \\ 4 & -2 & 0 & 5 \end{pmatrix}, \qquad
h = [0\ \ 0\ \ 0]^T \quad \text{and} \quad \xi = [1\ \ 1\ \ 1]^T.$$

From Figures 1 and 2, it can be deduced that all proposed estimators, namely the restricted
LASSO, preliminary test LASSO, Stein-type shrinkage LASSO and its positive-part version,
perform better than the LASSO estimator in the sense of having smaller ADQR. It is also
evident that as we deviate from the null hypothesis, the ADQR values get larger. Finally, as
one may expect, by decreasing the level of significance, the preliminary test LASSO estimator
performs better.

We also depicted the ADQR functions for different $p$ and $q$ values and found no substantial
change in performance.
Figure 1: ADQR functions for $\sigma^2 = 1$ and different levels of significance $\alpha$.
(Panels: $p = 5$, $q = 3$, $\sigma^2 = 1$ with $\alpha = 0.01, 0.05, 0.2, 0.25$.)

Figure 2: ADQR functions for $\sigma^2 = 10$ and different levels of significance $\alpha$.
(Panels: $p = 5$, $q = 3$, $\sigma^2 = 10$ with $\alpha = 0.01, 0.05$; vertical axis: ADQR.)
4 Simulation

In this section, we conduct a Monte Carlo simulation to analyze relative efficiencies with
respect to different levels of sparsity.

We generate the $X$ matrix from a multivariate normal distribution with mean vector $\mu = 0$
and covariance matrix $\Sigma$. The off-diagonal elements of the covariance matrix are set
equal to $r$, with $r = 0, 0.2, 0.9$. We consider $n = 100$ and various $p$ ranging from 10 to
30.

In our simulation scheme, $\beta$ is a $p$-vector and a function of $\Delta^2$. When
$\Delta^2 = 0$, $\beta$ is the null vector. $\Delta^2 > 0$ is equivalent to a "violation" of
the null hypothesis. We considered 9 different values for $\Delta^2$, namely 0, 1, 2, 3, 5,
10, 20, 30, 50. With the way the $\beta$ vector is defined in our setup, $\Delta^2 = 0$
indicates that the data are generated under the null hypothesis, whereas $\Delta^2 > 0$
indicates a data set generated under the alternative hypothesis. Each realization was repeated
2000 times to obtain the squared bias and variance of the estimated regression parameters.

Finally, risks are calculated for the ULE, RLE, PTLE, SSLE and PRSSLE. The responses were
simulated from the following model:

$$Y_i = \sum_{j=1}^{p} x_{ij}\beta_j + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).$$

Relative efficiencies are calculated as
$\mathrm{Risk}(\tilde\beta_n)/\mathrm{Risk}(\hat\beta_n^*)$, where $\hat\beta_n^*$ is one of
the estimators whose relative efficiency is to be computed.

For comparing the relative efficiencies of the penalty estimators, the data generation setup
was slightly modified to accommodate the number of nonzero $\beta$s in the model. In
particular, we partitioned $\beta$ as $\beta = (k, q)$, where $k$ indicates the number of
nonzero $\beta$s and $q$ indicates the $p - k$ zeros, a function of $\Delta^2$. To illustrate,
when $p = 10$ and $k = 5$, we would have $\beta = (1, 1, 1, 1, 1, 0, 0, 0, 0, 0)^T$, and the
previously mentioned procedures would be used to generate the data.

From Tables 1-3, it can be seen that the PRSSLE has the best performance among all. As
numerical support for the assertions in the graphical representation, when we deviate from
the null model, neither the PTLE nor the SSLE dominates the other, and the PTLE performs
better as $\alpha$ gets larger. The relative efficiency of the proposed estimators increases
when there are more near-zero parameters present in the model. The performance of the
estimators decreases as we deviate from the null model.
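
The data-generating step of such a simulation can be sketched as follows. This is an illustrative reconstruction, assuming NumPy: the function names are hypothetical, the diagonal of $\Sigma$ is assumed to be 1 (the text only fixes the off-diagonal elements at $r$), and the dependence of $\beta$ on $\Delta^2$ is not reproduced because the text does not spell it out.

```python
import numpy as np

def equicorrelated_cov(p, r):
    """Covariance matrix with unit diagonal (an assumption) and all off-diagonals equal to r."""
    return (1.0 - r) * np.eye(p) + r * np.ones((p, p))

def simulate_dataset(n, p, k, r, sigma=1.0, rng=None):
    """Draw X ~ N(0, Sigma) and Y = X beta + eps with k leading ones in beta."""
    rng = np.random.default_rng() if rng is None else rng
    Sigma = equicorrelated_cov(p, r)
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = np.concatenate([np.ones(k), np.zeros(p - k)])   # e.g. (1,1,1,1,1,0,0,0,0,0)
    y = X @ beta + rng.normal(scale=sigma, size=n)
    return X, y, beta

def relative_efficiency(risk_reference, risk_candidate):
    """Ratio of the reference risk to the candidate risk; values above 1 favor the candidate."""
    return risk_reference / risk_candidate
```

In a full study one would repeat `simulate_dataset` many times (2000 in the text), fit each estimator on every replicate, and average the squared estimation errors to obtain the risks entering `relative_efficiency`.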
Table 1: Relative efficiencies of the estimators for $r = 0$ and different values of $p$ and $k$.

   p      ULE      ULE      ULE      ULE      ULE      ULE      RLE     PTLE     PTLE     PTLE     SSLE   PRSSLE
          k=1      k=3      k=4      k=5      k=6      k=p               0.15     0.20     0.25

$\Delta^2 = 0$
  10    23.42    15.97    17.12    19.13    19.54    19.49    96.41    24.57    22.91    21.25    39.41    39.42
  20    43.80    46.44    47.19    43.61    47.33    44.66   347.55    64.67    58.41    50.63   144.03   144.11
  30   112.44   100.43    89.37   119.88    92.40   119.22  1181.14   706.06   515.65   336.90   876.35   934.60

$\Delta^2 = 1$
  10    13.43     9.77    10.41     8.94     8.30     6.98     9.19     7.55     7.38     7.20     8.17     8.18
  20    38.07    26.55    24.00    23.79    22.79    11.09    13.42    11.72    11.56    11.33    12.88    12.88
  30    76.32    54.95    51.71    49.40    44.92    17.77    20.60    20.34    20.07    19.75    20.50    20.51

$\Delta^2 = 2$
  10    10.30     8.91     7.55     7.41     6.41     5.53     7.13     5.97     5.81     5.68     6.41     6.41
  20    27.23    21.96    19.14    17.79    17.35    10.21    12.44    10.69    10.51    10.34    11.91    11.91
  30    50.46    42.27    45.57    39.07    35.93    16.35    18.92    18.52    18.25    17.93    18.81    18.81

$\Delta^2 = 3$
  10     6.64     6.39     6.06     5.40     5.19     4.74     5.83     4.95     4.88     4.81     5.34     5.35
  20    17.34    15.74    14.55    14.00    13.44     8.96    10.45     9.36     9.23     9.08    10.09    10.10
  30    37.58    34.27    31.82    30.83    28.71    14.85    17.05    16.84    16.72    16.32    16.95    16.97

$\Delta^2 = 5$
  10     3.97     3.77     3.50     3.58     3.38     3.19     3.53     3.27     3.24     3.23     3.39     3.40
  20     9.42     8.86     8.43     8.46     8.15     6.42     7.02     6.60     6.55     6.47     6.88     6.88
  30    20.18    18.33    17.84    18.14    17.22    10.97    12.02    11.99    11.89    11.70    11.99    11.99

$\Delta^2 = 10$
  10     1.95     1.93     1.80     1.90     1.88     1.84     1.93     1.87     1.86     1.85     1.89     1.89
  20     3.72     3.66     3.68     3.65     3.55     3.33     3.54     3.38     3.36     3.34     3.49     3.49
  30     6.93     7.09     6.99     6.87     6.88     6.18     6.53     6.49     6.47     6.43     6.51     6.51

$\Delta^2 = 20$
  10     1.35     1.32     1.35     1.33     1.36     1.36     1.38     1.37     1.36     1.36     1.37     1.37
  20     1.99     2.01     2.01     1.97     2.00     1.97     2.00     1.98     1.98     1.97     1.99     1.99
  30     3.22     3.25     3.21     3.25     3.30     3.25     3.32     3.31     3.31     3.30     3.32     3.32

$\Delta^2 = 30$
  10     1.23     1.22     1.21     1.23     1.22     1.23     1.24     1.24     1.23     1.23     1.24     1.24
  20     1.63     1.65     1.64     1.66     1.61     1.68     1.71     1.69     1.69     1.69     1.70     1.70
  30     2.47     2.54     2.54     2.52     2.52     2.48     2.51     2.51     2.50     2.50     2.51     2.51

$\Delta^2 = 50$
  10     1.17     1.17     1.18     1.16     1.18     1.16     1.17     1.17     1.17     1.17     1.17     1.17
  20     1.49     1.53     1.47     1.50     1.49     1.51     1.53     1.52     1.52     1.51     1.52     1.52
  30     2.14     2.13     2.08     2.11     2.09     2.15     2.18     2.17     2.17     2.17     2.18     2.18


Table 2: Relative efficiencies of the estimators for $r = 0.2$ and different values of $p$ and $k$.

   p      ULE      ULE      ULE      ULE      ULE      ULE      RLE     PTLE     PTLE     PTLE     SSLE   PRSSLE
          k=1      k=3      k=4      k=5      k=6      k=p               0.15     0.20     0.25

$\Delta^2 = 0$
  10    20.64    20.21    16.21    18.22    20.23    18.14    73.26    23.72    21.20    19.70    34.63    34.91
  20    44.37    47.21    45.70    53.74    44.38    50.79   389.22    65.82    59.01    54.34   162.41   163.14
  30   108.61   120.92    78.55   116.06    86.10   114.17   673.36   477.85   442.09   289.67   576.68   620.72

$\Delta^2 = 1$
  10    14.67    12.02     9.59     9.72     9.03     7.38    10.33     8.02     7.77     7.65     8.98     9.02
  20    35.14    29.74    26.11    27.71    23.27    13.62    17.33    14.75    14.30    14.05    16.54    16.54
  30    83.94    72.43    51.38    59.23    46.21    24.32    29.79    29.17    28.94    27.99    29.55    29.68

$\Delta^2 = 2$
  10    11.13     8.78     8.72     8.53     7.66     6.29     8.48     6.77     6.58     6.46     7.45     7.45
  20    30.16    25.60    22.51    21.76    21.15    12.71    15.62    13.68    13.30    12.90    14.99    15.00
  30    63.08    49.59    51.37    45.83    43.62    23.52    28.94    27.87    27.32    26.72    28.62    28.70

$\Delta^2 = 3$
  10     7.19     6.77     6.55     6.38     6.25     5.75     6.69     6.03     5.93     5.85     6.30     6.31
  20    19.67    17.27    17.39    15.67    15.72    11.02    13.38    11.56    11.43    11.19    12.86    12.86
  30    42.76    40.57    40.57    34.44    33.20    20.80    24.50    24.13    23.93    23.04    24.33    24.40




3 

2.02 

2.03 

2.09 

2.06 

2.03 

2.11 

2.06 

2.0 

2 

4.08 

4.22 

4.05 

4.28 

4.34 

4.61 

4.42 

4.4 

3 

8.35 

8.40 

8.49 

8.43 

9.28 

9.91 

9.78 

9.7 






A^ - 

20 



1 

1.38 

1.43 

1.41 

1.44 

1.42 

1.44 

1.42 

1.4 

5 

2.21 

2.24 

2.19 

2.25 

2.45 

2.50 

2.46 

2.4 

5 

3.74 

3.64 

3.91 

3.79 

4.61 

4.76 

4.74 

4.7 






A^ - 

30 



8 

1.26 

1.25 

1.26 

1.28 

1.42 

1.44 

1.42 

1.4 

3 

1.82 

1.77 

1.79 

1.79 

2.45 

2.50 

2.46 

2.4 

3 

2.82 

2.91 

2.89 

2.90 

4.61 

4.76 

4.74 

4.7 






A^ - 

50 



9 

1.18 

1.20 

1.21 

1.20 

1.21 

1.22 

1.22 

1.2 

0 

1.58 

1.63 

1.61 

1.64 

1.65 

1.67 

1.66 

1.6 

6 

2.42 

2.47 

2.45 

2.41 

2.65 

2.69 

2.68 

2.6 









Table 3: Relative efficiencies of the estimators for fixed $\Delta^2$, $r = 0.9$, and different values of $p$ and $k$.

   p      ULE      ULE      ULE      ULE      ULE      ULE      RLE     PTLE     PTLE     PTLE     SSLE   PRSSLE
          k=1      k=3      k=4      k=5      k=6      k=p               0.15     0.20     0.25

$\Delta^2 = 0$
  10    20.82    20.22    16.11    19.35    21.15    19.03   123.65    23.36    20.80    20.00    38.20    38.74
  20    45.05    46.89    46.75    53.90    44.56    50.71   632.12    67.21    60.49    53.35   173.14   175.40
  30   108.96   118.30    79.47   115.30    85.65   114.00  1172.26   759.67   585.21   317.76   826.66   912.68

$\Delta^2 = 1$
  10    21.52    17.79    21.27    21.48    17.66    15.04    53.45    18.09    16.77    15.79    25.53    26.14
  20    53.00    59.20    47.68    49.57    47.64    42.22   140.71    51.69    45.55    43.08    94.46    94.96
  30   104.08    92.84    98.95    96.73    90.50    85.50   344.20   226.35   208.81   172.24   298.30   306.38

$\Delta^2 = 2$
  10    15.59    15.52    16.25    16.19    19.80    21.24    52.47    26.07    22.69    21.51    32.62    33.01
  20    47.47    42.76    46.90    39.31    47.51    40.03   134.72    47.79    44.32    41.59    89.99    90.42
  30    97.37   105.75   106.75    88.70    82.06    95.76   308.65   264.45   222.31   165.95   275.59   285.19

$\Delta^2 = 3$
  10    19.50    16.80    15.94    14.38    15.77    16.35    36.63    18.22    17.82    17.25    24.47    24.82
  20    42.78    44.65    40.20    44.48    37.82    41.41   118.89    50.64    45.98    43.02    86.71    87.32
  30    99.68    90.81    82.79    73.58    74.62    83.46   297.35   218.31   198.30   143.54   260.92   268.40

$\Delta^2 = 5$
  10    12.24    12.50    11.49    12.89    12.51    12.38    24.72    14.18    13.39    12.86    17.75    17.81
  20    34.70    28.45    32.99    34.51    35.30    32.62    86.57    41.94    38.01    35.64    65.12    65.35
  30    71.34    61.56    73.58    64.65    78.40    76.18   234.67   190.59   156.13   123.91   212.46   217.32

$\Delta^2 = 10$
  10     7.72     7.14     7.35     7.88     7.83     8.10    11.70     8.85     8.66     8.33     9.96     9.99
  20    19.24    19.80    18.82    20.82    20.12    24.23    42.53    26.37    25.72    24.70    36.54    36.62
  30    41.44    43.91    41.64    44.13    43.36    54.70   125.52   102.55    89.92    80.92   116.12   117.89

$\Delta^2 = 20$
  10     3.70     3.93     4.00     4.08     4.30     4.33     5.19     4.50     4.44     4.34     4.81     4.82
  20     9.77     9.67     9.90    10.32    10.62    14.67    19.13    15.67    15.46    15.07    17.95    17.98
  30    19.07    20.16    21.57    22.40    22.48    37.61    56.97    54.06    51.10    47.37    55.21    55.64

$\Delta^2 = 30$
  10     2.99     2.99     3.19     3.28     3.30     3.48     3.93     3.55     3.54     3.49     3.70     3.71
  20     6.85     7.27     7.43     7.55     7.90    10.44    12.56    11.00    10.85    10.61    11.99    12.03
  30    14.29    15.10    15.41    16.12    16.41    27.59    36.70    35.27    34.41    32.41    36.34    36.37

$\Delta^2 = 50$
  10     2.46     2.51     2.52     2.61     2.69     2.70     2.94     2.75     2.73     2.71     2.82     2.83
  20     5.53     5.64     5.62     5.92     6.08     7.68     8.74     7.95     7.84     7.77     8.50     8.51
  30    11.07    11.86    11.72    12.27    12.80    18.75    23.69    22.71    22.14    21.81    23.42    23.44

5 Real Data 

In this section we study the performance of the proposed LASSO-type estimators in a real
example. We use the prostate dataset (Stamey et al., 1989). These data come from a study
that examined the correlation between the level of prostate specific antigen and a number of
clinical measures in men who were about to receive a radical prostatectomy. A description of
the variables in this dataset is given in Table 4, and Figure 3 shows the box plot of the variables.


Table 4: Description of the variables of prostate data.

Variables   Description                                   Remarks
lpsa        Log of prostate specific antigen (PSA)        Response
lcavol      Log cancer volume
lweight     Log prostate weight
age         Age                                           Age in years
lbph        Log of benign prostatic hyperplasia amount
svi         Seminal vesicle invasion
lcp         Log of capsular penetration
gleason     Gleason score                                 A numeric vector
pgg45       Percent of Gleason scores 4 or 5


First, we center the predictor variables. In the second step, we fit the linear regression
model to predict the response variable from the regressors. The unrestricted LASSO (UL),
restricted LASSO (RL), preliminary test LASSO (PTL), Stein-type shrinkage LASSO (SL),
and positive rule Stein-type shrinkage LASSO (PRL) estimators are used to estimate the
regression parameters.
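
As a rough illustration of how the last three estimators combine the unrestricted and restricted fits, the following Python sketch forms the preliminary test, Stein-type and positive rule estimators from two given coefficient vectors. It follows the decompositions used in the proofs of the appendix; the function name `shrinkage_estimators`, its argument names, and the reading of the shrinkage constant as k = q - 2 with q the number of restrictions are assumptions made for the example, and the test statistic and critical value are taken as given.

```python
import numpy as np

def shrinkage_estimators(beta_l, beta_rl, L_n, crit, k):
    """Combine an unrestricted and a restricted estimate (hypothetical helper).

    beta_l  : unrestricted LASSO estimate (1-D array)
    beta_rl : restricted LASSO estimate (same shape)
    L_n     : value of the test statistic for H beta = h (positive scalar)
    crit    : critical value L_{n,alpha} of the preliminary test
    k       : shrinkage constant (assumed to be q - 2)
    """
    diff = beta_l - beta_rl
    # Preliminary test: use the restricted fit when the test does not reject.
    ptl = beta_l - diff * float(L_n < crit)
    # Stein-type shrinkage of the unrestricted fit toward the restricted one.
    ssl = beta_l - (k / L_n) * diff
    # Positive rule: when L_n < k the Stein factor would over-shrink,
    # so the estimator falls back to the restricted fit.
    prssl = ssl - (1.0 - k / L_n) * float(L_n < k) * diff
    return {"PTL": ptl, "SSL": ssl, "PRSSL": prssl}
```

When the statistic is small (L_n < k), the positive rule estimator coincides with the restricted fit; when it is large, it coincides with the Stein-type estimator.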

The summary statistics of the response variable (lpsa) are shown in Table 5.

The performance of the estimators is evaluated using the average 10-fold cross validation error.

Figure 3: The box plot of predictors in prostate data.


Table 5: Summary statistics for response variable in the prostate dataset.

    Min       Q1   Median       Q3      Max     Mean       SD
-0.4308   1.7320   2.5920   3.0560   5.5830   2.4780   1.1543


k-fold cross validation is a well-known method that randomly divides the data set into k
equal-sized subsets. One of the subsets is selected as the test set, and the k - 1 remaining
subsets form the training set, which is used to fit the model. The fitted model is then used to
predict the response variable in the test set. The prediction error is the squared difference
between the observed and predicted values of the response variable in the test set.
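
The procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`kfold_prediction_error`, `ols_fit`, `ols_predict`) are hypothetical, ordinary least squares stands in for the LASSO-type fits, and the squared errors are averaged within each fold, which the text does not specify.

```python
import numpy as np

def kfold_prediction_error(X, y, fit, predict, k=10, seed=0):
    """Average squared prediction error over k randomly formed folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))          # random assignment of observations
    folds = np.array_split(idx, k)         # k (nearly) equal-sized subsets
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])    # fit on the k - 1 training folds
        resid = y[test] - predict(model, X[test])
        errors.append(np.mean(resid ** 2)) # squared error on the held-out fold
    return float(np.mean(errors))

def ols_fit(X, y):
    # Least squares with an intercept; a stand-in for any estimator.
    Xc = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return coef

def ols_predict(coef, X):
    return coef[0] + X @ coef[1:]
```

Any other estimator can be plugged in by passing a different pair of `fit` and `predict` functions.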

We used the following specifications:



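$$
H = \begin{pmatrix}
-1 & 3 & 1 & -1 & 0 & -1 & 0 & 0 \\
-1 & 1 & 0 & -1 & 0 & 1 & 0 & 0 \\
1 & 0 & -1 & 1 & 0 & 0 & 1 & 0
\end{pmatrix},
\qquad
h = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
$$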
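
As a minimal sketch of how such a specification can be imposed numerically, the Python fragment below encodes H and h as read from the specification above and projects an arbitrary coefficient vector onto the subspace $H\beta = h$. It assumes the restricted estimator takes the usual projection form $\beta - C^{-1}H^{\top}(HC^{-1}H^{\top})^{-1}(H\beta - h)$, as in the proofs of the appendix; the function name `restrict` is hypothetical, and C would be formed from the design matrix in practice.

```python
import numpy as np

# Restriction matrix and vector as read from the specification (3 x 8 and 3 x 1).
H = np.array([
    [-1, 3,  1, -1, 0, -1, 0, 0],
    [-1, 1,  0, -1, 0,  1, 0, 0],
    [ 1, 0, -1,  1, 0,  0, 1, 0],
], dtype=float)
h = np.zeros(3)

def restrict(beta, C, H, h):
    """Map beta to the point satisfying H b = h, using the metric given by C."""
    Cinv_Ht = np.linalg.solve(C, H.T)                  # C^{-1} H^T
    correction = Cinv_Ht @ np.linalg.solve(H @ Cinv_Ht, H @ beta - h)
    return beta - correction
```

A vector that already satisfies the restriction is left unchanged, so applying `restrict` twice gives the same result as applying it once.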


Taking 1000 as a sufficiently large number of repetitions in a bootstrap simulation scheme,
Table 6 shows the average and standard deviation of the prediction errors.


Table 6: 10-fold cross validation average prediction errors and standard deviations for prostate data.

          ULE      RLE   PTLE(0.01)   PTLE(0.05)   PTLE(0.10)     SSLE   PRSSLE
mean  5822.99  5753.18      6122.17      6076.60      5906.45  5798.20  5797.90
sd      92.56    69.58       251.41       262.18       213.98    92.43    92.54


Based on Table 6, PRSSLE is the best of the shrinkage-type estimators in terms of the
prediction error (the smaller the risk, the better the estimator), followed closely by SSLE;
only RLE, which imposes the restriction outright, attains a smaller average error. If the level
of significance $\alpha$ for constructing PTLE increases, then the prediction error decreases.
These results confirm our assertions.

Figure 4 shows the boxplot of average prediction errors for the proposed estimators,
which demonstrates small prediction errors for the PRSSLE and SSLE. Although PRSSLE has
a marginally smaller prediction error than SSLE, SSLE has slightly less variability. The
variability of RLE is smaller than that of the others, which suggests that the null hypothesis
is approximately true here.

Figure 4: Boxplot of prediction errors of the unrestricted LASSO (UL), restricted LASSO (RL),
preliminary-test LASSO (PTL) for $\alpha$ = 0.01, 0.05 and 0.1, Stein-type shrinkage LASSO (SSL)
and positive rule Stein-type shrinkage LASSO (PRSSL) estimators for prostate data.

6 Conclusion 

In this paper, we proposed an improvement on the LASSO estimator by imposing a restriction
on the model and using preliminary testing and shrinkage techniques. Specifically, we introduced
preliminary-test LASSO (PTL), Stein-type shrinkage LASSO (SSL) and positive-rule shrinkage
LASSO (PRSSL) estimators in the presence of a sub-space restriction. The performance of the
proposed estimators under the null hypothesis has been studied for the case where the sample
size (n) exceeds the number of features (p). The proposed methodology for improving the LASSO
can also be applied in high dimensional situations, i.e., large p, small n.

In addition to the theorems establishing the asymptotic behaviour of the proposed estimators
analytically, we used a simulation study to compare the performance of the estimators numerically
for various configurations of the number of parameters (p), the correlation coefficient between
the predictors (r), and the error variance ($\sigma^2$). In order to compare the improved
estimators with the classical LASSO, we used the relative efficiency criterion. For different
values of the non-centrality parameter $\Delta^2$, which measures the degree of model
misspecification, the number of non-zero $\beta$s was varied and the performance of the
estimators was evaluated. We found that the PRSSLE has the best performance among all.
When we deviate from the null model, neither PTLE nor SLE dominates the other, and the
PTLE performs better as $\alpha$ gets larger. The relative efficiency of the proposed estimators
increases when more near-zero parameters are present in the model. The performance of the
estimators decreases as we deviate from the null model.

As an application, the prostate dataset was analyzed. In this respect, the 10-fold cross validation
averages and standard deviations of the prediction errors based on the LASSO, restricted
LASSO, preliminary test LASSO, Stein-type shrinkage LASSO and positive rule Stein-type
shrinkage LASSO were compared. The new estimators dominate the LASSO in the sense of average
prediction error, although the dominance of the PRSSLE was not clear-cut. We conclude that
the improved LASSO estimators are better than the classical version.

A Proof of Theorems

A.1 Proof of Theorem 1

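For (i), see Knight and Fu (2000). To prove (ii), by Slutsky's theorem, the equation defining
the restricted estimator, and assumption A2, we know

$$
\hat\beta_n^{RL} = \hat\beta_n^{L} - C_n^{-1}H^{\top}(HC_n^{-1}H^{\top})^{-1}(H\hat\beta_n^{L} - h)
\xrightarrow{p} \operatorname{argmin}(Z) - C^{-1}H^{\top}(HC^{-1}H^{\top})^{-1}(H\operatorname{argmin}(Z) - h).
$$

(iii) By the same equation, we have
$\hat\beta_n^{L} - \hat\beta_n^{RL} = C_n^{-1}H^{\top}(HC_n^{-1}H^{\top})^{-1}(H\hat\beta_n^{L} - h)$,
which converges to $C^{-1}H^{\top}(HC^{-1}H^{\top})^{-1}(H\operatorname{argmin}(Z) - h)$,
again by Slutsky's theorem and assumption A2.

(iv) Based on Theorem ??, we know $I(L_n < L_{n,\alpha}) \to I(L < L_\alpha)$. Now, by the
equation defining the preliminary test estimator, part (iii) of this theorem, and again by
Slutsky's theorem, we have

$$
\hat\beta_n^{PTL} \xrightarrow{p} \operatorname{argmin}(Z) - C^{-1}H^{\top}(HC^{-1}H^{\top})^{-1}(H\operatorname{argmin}(Z) - h)\,I(L < L_\alpha).
$$

To prove (v) and (vi), note that $k_n \to k = q - 2$; the result then follows from the equations
defining the corresponding estimators, part (iii) of this theorem, and Slutsky's theorem.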


A.2 Proof for Theorem 

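Referring to Sen and Singer (1993) and Slutsky's theorem, (i) is obvious. The proof of (ii) is given in
Knight and Fu (2000). For (iii), by the equation defining the restricted estimator,

$$
\begin{aligned}
\sqrt{n}(\hat\beta_n^{RL} - \beta)
&= \sqrt{n}(\hat\beta_n^{L} - \beta) - C_n^{-1}H^{\top}(HC_n^{-1}H^{\top})^{-1}\left[H\sqrt{n}(\hat\beta_n^{L} - \beta) + \sqrt{n}(H\beta - h)\right] \\
&= \sqrt{n}(\hat\beta_n^{L} - \beta) - nC_n^{-1}H^{\top}(nHC_n^{-1}H^{\top})^{-1}\left[H\sqrt{n}(\hat\beta_n^{L} - \beta) + \sqrt{n}(H\beta - h)\right].
\end{aligned}
$$

Making use of Slutsky's theorem and part (ii), we obtain

$$
\sqrt{n}(\hat\beta_n^{RL} - \beta) \xrightarrow{d} \operatorname{argmin}(V) - C^{-1}H^{\top}(HC^{-1}H^{\top})^{-1}\left[H\operatorname{argmin}(V) + \xi\right].
$$

To prove (iv), we have

$$
\begin{aligned}
\sqrt{n}(\hat\beta_n^{L} - \hat\beta_n^{RL})
&= \sqrt{n}\left(\hat\beta_n^{L} - \hat\beta_n^{L} + C_n^{-1}H^{\top}(HC_n^{-1}H^{\top})^{-1}(H\hat\beta_n^{L} - h)\right) \\
&= \sqrt{n}\,C_n^{-1}H^{\top}(HC_n^{-1}H^{\top})^{-1}(H\hat\beta_n^{L} - h) \\
&= C_n^{-1}H^{\top}(HC_n^{-1}H^{\top})^{-1}\left(H\sqrt{n}(\hat\beta_n^{L} - \beta) + \sqrt{n}(H\beta - h)\right) \\
&= C_n^{-1}H^{\top}(HC_n^{-1}H^{\top})^{-1}\left(H\sqrt{n}(\hat\beta_n^{L} - \beta) + \xi\right).
\end{aligned}
$$

The result follows from Slutsky's theorem and part (ii).

Part (v) is an immediate consequence of Theorem ??. To prove (vi), by Theorem ??, using the
fact that $I(L_n < L_{n,\alpha}) \xrightarrow{\mathcal{D}} I(L < L_\alpha)$, parts (ii) and (iv), and
Slutsky's theorem, $\sqrt{n}(\hat\beta_n^{PTL} - \beta)$ converges in distribution to

$$
\operatorname{argmin}(V) - C^{-1}H^{\top}(HC^{-1}H^{\top})^{-1}\left[H\operatorname{argmin}(V) + \xi\right] I(L < L_\alpha).
$$

To prove (vii), we have that

$$
\sqrt{n}(\hat\beta_n^{SSL} - \beta) = \sqrt{n}(\hat\beta_n^{L} - \beta) - k_n\sqrt{n}(\hat\beta_n^{L} - \hat\beta_n^{RL})L_n^{-1}.
$$

By parts (ii) and (iv), $L_n^{-1} \to L^{-1}$, $k_n \to k$, and applying Slutsky's theorem, we may write

$$
\sqrt{n}(\hat\beta_n^{SSL} - \beta) \xrightarrow{d} \operatorname{argmin}(V) - k\left[C^{-1}H^{\top}(HC^{-1}H^{\top})^{-1}(H\operatorname{argmin}(V) + \xi)\right]L^{-1}. \qquad (13)
$$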
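
And finally, to prove (viii), we have

$$
\sqrt{n}(\hat\beta_n^{PRSL} - \beta) = \sqrt{n}(\hat\beta_n^{L} - \beta) - \left[k_nL_n^{-1} + (1 - k_nL_n^{-1})I(L_n < k_n)\right]\sqrt{n}(\hat\beta_n^{L} - \hat\beta_n^{RL}).
$$

Proceeding in the same fashion as in (vii), and using
$I(L_n < L_{n,\alpha}) \xrightarrow{\mathcal{D}} I(L < L_\alpha)$, the proof for
$\sqrt{n}(\hat\beta_n^{PRSL} - \beta)$ is straightforward.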

A.3 Proof of Theorem 4

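If $\lambda_n = o(n)$, then $\lambda_n/n \to 0$, i.e. $\lambda_0 = 0$. The minimum of $Z(\phi)$,
defined in part (i) of the theorem proved in Section A.1, then satisfies

$$
\frac{\partial}{\partial\phi}Z(\phi) = 2C\phi - 2C\beta = 0.
$$

Hence $\operatorname{argmin} Z(\phi) = \beta$, and it follows that $\hat\beta_n^{L} \xrightarrow{p} \beta$.
Substituting this result into the same theorem completes the proof, since under the null
hypothesis $\mathcal{H}_0$, $H\beta - h = 0$ and all estimators are consistent.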

A.4 Proof of Theorem 5

Proceeding as in the proof of Theorem 7.8.2.3 of Saleh (2006), and using the fact that
under $\sqrt{n}$-consistency $\lambda_0 = 0$ and $\operatorname{argmin}(V) = W$, all the stated
results follow directly after some algebra.